Grammar of Tables

Jack Davis

2024-08-09

Packages

library(gt)
library(gtExtras)
library(dplyr) # for pipeline stuff
library(htmltools) # for tagList
library(svglite) # for sparkline plots
library(webshot2) # for saving a gt object as a png

What is GT (grammar of tables)

Visualization isn’t just graphs; it also includes tables. Today we’re going to look at a package that takes the grammar-of-graphics principles of working in layers and treating graphs as objects that can be added onto, and applies them to tables.

(Diagram from: https://towardsdatascience.com/exploring-the-gt-grammar-of-tables-package-in-r-7fff9d0b40cd and https://gt.rstudio.com/ )

Basic GT tables

(Copied from https://r-graph-gallery.com/package/gt.html with some added commentary)

First, we’ll set up some data. TO DO: Replace with sports data.

# Create a simple data frame
data = data.frame(
  Country = c("USA", "China", "India", "Brazil"),
  Capitals = c("Washington D.C.", "Beijing", "New Delhi", "Brasília"),
  Population = c(331, 1441, 1393, 212),
  GDP = c(21.43, 14.34, 2.87, 1.49)
)

Then we’ll put that data.frame into a gt() table, just as we would with ggplot(). The pipe operator %>% means “take what’s on the left and place it into the first argument of the function on the right”. With tidyverse-style functions like ggplot() and gt(), the first argument is the dataset.

# Alternatively you can do (same output):
#gt(data)

# Use the gt function
data %>%
  gt()
Country   Capitals          Population   GDP
USA       Washington D.C.   331          21.43
China     Beijing           1441         14.34
India     New Delhi         1393         2.87
Brazil    Brasília          212          1.49
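
To make the pipe concrete, here is a minimal sketch that checks a piped call against the equivalent direct call. It uses head() purely as a stand-in function and a cut-down copy of the data frame, so that the snippet runs on its own.

```r
library(dplyr) # provides the %>% pipe

# Cut-down copy of the example data, recreated so this runs standalone
data <- data.frame(
  Country = c("USA", "China", "India", "Brazil"),
  Population = c(331, 1441, 1393, 212)
)

# The pipe places the left-hand side into the first argument...
piped <- data %>% head(2)

# ...so it is the same as calling the function directly
direct <- head(data, 2)

identical(piped, direct)
```

The same reasoning is why data %>% gt() and gt(data) give the same table.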

Basic GT tables - Titles

Titles can be added, and markdown text editing can be applied to the title and subtitle as long as you have the md() function wrapping around the text.

Notice the markdown: double stars ** indicate bold, single stars * indicate italics, and single backticks ` indicate inline code.

data %>%
  gt() %>%
    tab_header(title = md("What a **nice title**"),
               subtitle = md("Pretty *cool subtitle* too, `isn't it?`"))
What a nice title
Pretty cool subtitle too, isn't it?
Country   Capitals          Population   GDP
USA       Washington D.C.   331          21.43
China     Beijing           1441         14.34
India     New Delhi         1393         2.87
Brazil    Brasília          212          1.49

Basic GT tables - Titles

Notice that we don’t need to specify the data again. Technically we don’t even need to specify the table again if we save the table as an object, like we would with a graph.

basic_gt <- data %>% gt()

basic_gt %>%
    tab_header(title = "Basic text can go here too",
               subtitle = md("You *need* to have the `md()` wrapper to do more"))
Basic text can go here too
You need to have the md() wrapper to do more
Country   Capitals          Population   GDP
USA       Washington D.C.   331          21.43
China     Beijing           1441         14.34
India     New Delhi         1393         2.87
Brazil    Brasília          212          1.49

Basic GT tables - Titles

We can also use HTML instead of markdown. There isn’t a way to change the colour of text like this with markdown, unfortunately.

# create and display the gt table 
data %>%
  gt() %>%
    tab_header(title = html("<span style='color:red;'>A red title</span>"))
A red title
Country   Capitals          Population   GDP
USA       Washington D.C.   331          21.43
China     Beijing           1441         14.34
India     New Delhi         1393         2.87
Brazil    Brasília          212          1.49

Basic GT tables - Titles

We can add images too, using HTML. (html() comes from gt and HTML() comes from htmltools; it doesn’t matter which one you use here.)

data %>%
  gt() %>%
    tab_header(title = html("<span style='color:red;'>A <strong>red</strong> title</span>"),
               subtitle = tagList(
                 tags$div(style = css(`text-align` = "center"),
                          HTML(web_image("https://www.r-project.org/logo/Rlogo.png")
                     )
                   )
                 )
               )
A red title
Country   Capitals          Population   GDP
USA       Washington D.C.   331          21.43
China     Beijing           1441         14.34
India     New Delhi         1393         2.87
Brazil    Brasília          212          1.49

Basic GT tables - Titles

You can include an image using markdown as well.

data %>%
  gt() %>%
    tab_header(title = md("![](uwaggsLogo.png){width=30%}"))
Country   Capitals          Population   GDP
USA       Washington D.C.   331          21.43
China     Beijing           1441         14.34
India     New Delhi         1393         2.87
Brazil    Brasília          212          1.49

Basic GT tables - Footers

tab_footnote() behaves a lot like tab_header(), but for the bottom of the table.

data %>%
  gt() %>%
    tab_footnote(footnote = md("This text is the footer of this **table**"))
Country   Capitals          Population   GDP
USA       Washington D.C.   331          21.43
China     Beijing           1441         14.34
India     New Delhi         1393         2.87
Brazil    Brasília          212          1.49
This text is the footer of this table
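
As a related sketch, gt also has tab_source_note() for a citation-style line under the table, which can sit alongside footnotes. The note text below is placeholder wording and the data frame is a cut-down copy of the example data.

```r
library(gt)
library(dplyr) # for the %>% pipe

# Cut-down copy of the example data so the snippet runs standalone
data <- data.frame(
  Country = c("USA", "China", "India", "Brazil"),
  Population = c(331, 1441, 1393, 212)
)

src_tab <- data %>%
  gt() %>%
  # A footnote with no location goes at the bottom of the table
  tab_footnote(footnote = md("A general **footnote**")) %>%
  # A source note is a separate line, also at the bottom (placeholder text)
  tab_source_note(source_note = md("Source: *placeholder citation*"))

src_tab
```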

Basic GT tables - Footers

data2 = data.frame(
  Planet = c("Earth", "Mars", "Jupiter", "Venus"),
  Moons = c(1, 2, 79, 0),
  Distance_from_Sun = c(149.6, 227.9, 778.3, 108.2),
  Diameter = c(12742, 6779, 139822, 12104)
)

We can do multiple footnotes, and we can specify for all footnotes whether the superscripts should be numbers, letters, LETTERS, standard symbolic marks, or extended symbolic marks.

data2 %>%
  gt() %>%
    tab_footnote(footnote = md("Measured in **millions** of Km"),
                 locations = cells_column_labels(columns = Distance_from_Sun)) %>%
    tab_footnote(footnote = md("Measured in **Km**"),
                 locations = cells_column_labels(columns = Diameter)) %>%
    tab_footnote(footnote = md("The original data are from *Some Organization*")) %>%
    opt_footnote_marks(marks = "LETTERS")
Planet    Moons   Distance_from_Sun (A)   Diameter (B)
Earth     1       149.6                   12742
Mars      2       227.9                   6779
Jupiter   79      778.3                   139822
Venus     0       108.2                   12104
The original data are from Some Organization
(A) Measured in millions of Km
(B) Measured in Km
data2 %>%
  gt() %>%
    tab_footnote(footnote = md("Measured in **millions** of Km"),
                 locations = cells_column_labels(columns = Distance_from_Sun)) %>%
    tab_footnote(footnote = md("Measured in **Km**"),
                 locations = cells_column_labels(columns = Diameter)) %>%
    tab_footnote(footnote = md("The original data are from *Some Organization*")) %>%
    opt_footnote_marks(marks = "extended")
Planet    Moons   Distance_from_Sun*   Diameter†
Earth     1       149.6                12742
Mars      2       227.9                6779
Jupiter   79      778.3                139822
Venus     0       108.2                12104
The original data are from Some Organization
* Measured in millions of Km
† Measured in Km

Basic GT tables - Footers

We can also attach a footnote to a specific element of the table, such as a column label, with the locations argument.

data %>%
  gt() %>%
    tab_footnote(footnote = md("English name"),
                 locations = cells_column_labels(columns = Country))
Country (1)   Capitals          Population   GDP
USA           Washington D.C.   331          21.43
China         Beijing           1441         14.34
India         New Delhi         1393         2.87
Brazil        Brasília          212          1.49
(1) English name
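
The same locations argument can point at a single body cell instead of a column label. The sketch below uses cells_body() with a row condition; the footnote wording is a placeholder and the data frame is a cut-down copy of data2.

```r
library(gt)
library(dplyr) # for the %>% pipe

# Cut-down copy of the planets data so the snippet runs standalone
data2 <- data.frame(
  Planet = c("Earth", "Mars", "Jupiter", "Venus"),
  Moons = c(1, 2, 79, 0)
)

cell_note_tab <- data2 %>%
  gt() %>%
  tab_footnote(
    footnote = "Placeholder note attached to one cell",
    # Pick the Moons cell in the row where Planet is Jupiter
    locations = cells_body(columns = Moons, rows = Planet == "Jupiter")
  )

cell_note_tab
```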

Basic GT tables - Spanners

We can also add spanners (super-headers) when we want to group certain columns together under one label. Notice here that the order in which we add the tab_spanner() elements doesn’t matter; we can add the two on the right before we add the two on the left.

basic_gt %>%
   tab_spanner(
    label = "Number",
    columns = c(GDP, Population)) %>%
  tab_spanner(
    label = "Label",
    columns = c(Country, Capitals)
  )
Label                       Number
Country   Capitals          GDP     Population
USA       Washington D.C.   21.43   331
China     Beijing           14.34   1441
India     New Delhi         2.87    1393
Brazil    Brasília          1.49    212

Basic GT tables - Spanners

Notice here that the Capitals column has two labels. The most recently added tab_spanner goes on the top in that case. Also notice that the columns selected do not need to be contiguous. If you need to rearrange the variables themselves, that can be done to the original data.frame.

basic_gt %>%
   tab_spanner(
    label = "Label One",
    columns = c(GDP, Capitals)) %>%
  tab_spanner(
    label = "Label Two",
    columns = c(Country, Capitals)
  )
Label Two                      Label Two
                       Label One
Country   Population   GDP     Capitals
USA       331          21.43   Washington D.C.
China     1441         14.34   Beijing
India     1393         2.87    New Delhi
Brazil    212          1.49    Brasília
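
As a sketch of rearranging the variables in the original data.frame first, the column order can be set with dplyr's select() before the data reach gt(). The spanner labels are the ones used earlier; the chosen order is only an illustration.

```r
library(gt)
library(dplyr)

data <- data.frame(
  Country = c("USA", "China", "India", "Brazil"),
  Capitals = c("Washington D.C.", "Beijing", "New Delhi", "Brasília"),
  Population = c(331, 1441, 1393, 212),
  GDP = c(21.43, 14.34, 2.87, 1.49)
)

# Put the columns in the order we want before building the table
reordered <- data %>%
  select(Country, Capitals, GDP, Population)

reordered_tab <- reordered %>%
  gt() %>%
  tab_spanner(label = "Label", columns = c(Country, Capitals)) %>%
  tab_spanner(label = "Number", columns = c(GDP, Population))

reordered_tab
```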

GT Extras

Taken from https://r-graph-gallery.com/368-plotting-in-cells-with-gtextras.html

If you want to put charts in the cells of a table, you need to aggregate your data at some point, because you can’t really create a chart from a single value.

Here we’ll work with the iris dataset. This dataset has 5 columns: 4 quantitative and 1 qualitative (Species, with 3 distinct labels). A simple way to aggregate these data is to group by the Species column.

This gives a new dataset with 3 rows and the same number of columns. Once the data are grouped, we need an aggregation for the quantitative columns. Because we want to create a chart for each of those columns, the aggregation will be the list of all values for the given species.

Here’s how to do it:

# load the dataset
data(iris)
head(iris)
##   Sepal.Length Sepal.Width Petal.Length Petal.Width Species
## 1          5.1         3.5          1.4         0.2  setosa
## 2          4.9         3.0          1.4         0.2  setosa
## 3          4.7         3.2          1.3         0.2  setosa
## 4          4.6         3.1          1.5         0.2  setosa
## 5          5.0         3.6          1.4         0.2  setosa
## 6          5.4         3.9          1.7         0.4  setosa
tail(iris)
##     Sepal.Length Sepal.Width Petal.Length Petal.Width   Species
## 145          6.7         3.3          5.7         2.5 virginica
## 146          6.7         3.0          5.2         2.3 virginica
## 147          6.3         2.5          5.0         1.9 virginica
## 148          6.5         3.0          5.2         2.0 virginica
## 149          6.2         3.4          5.4         2.3 virginica
## 150          5.9         3.0          5.1         1.8 virginica

This will create a list of 50 elements in each cell of the data.frame, holding the 50 values of that variable for that species.

# create aggregated dataset
agg_iris = iris %>%
  group_by(Species) %>%
  summarize(
    Sepal.L = list(Sepal.Length),
    Sepal.W = list(Sepal.Width),
    Petal.L = list(Petal.Length),
    Petal.W = list(Petal.Width)
    )
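
A quick way to confirm the shape of the aggregated data is to count the elements in each list cell. This is a minimal check on one column only, rebuilt here so it runs on its own; agg_check is an illustrative name.

```r
library(dplyr)

# Same aggregation as above, restricted to one column for brevity
agg_check <- iris %>%
  group_by(Species) %>%
  summarize(Sepal.L = list(Sepal.Length))

nrow(agg_check)            # one row per species
lengths(agg_check$Sepal.L) # number of values stored in each cell
```

Each of the three cells should hold 50 values, one for every flower of that species.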

GT Extras

This will take that data frame of lists and make it a gt table.

# display the table with default output with gt package
agg_iris %>%
  gt()
Species Sepal.L Sepal.W Petal.L Petal.W
setosa 5.1, 4.9, 4.7, 4.6, 5.0, 5.4, 4.6, 5.0, 4.4, 4.9, 5.4, 4.8, 4.8, 4.3, 5.8, 5.7, 5.4, 5.1, 5.7, 5.1, 5.4, 5.1, 4.6, 5.1, 4.8, 5.0, 5.0, 5.2, 5.2, 4.7, 4.8, 5.4, 5.2, 5.5, 4.9, 5.0, 5.5, 4.9, 4.4, 5.1, 5.0, 4.5, 4.4, 5.0, 5.1, 4.8, 5.1, 4.6, 5.3, 5.0 3.5, 3.0, 3.2, 3.1, 3.6, 3.9, 3.4, 3.4, 2.9, 3.1, 3.7, 3.4, 3.0, 3.0, 4.0, 4.4, 3.9, 3.5, 3.8, 3.8, 3.4, 3.7, 3.6, 3.3, 3.4, 3.0, 3.4, 3.5, 3.4, 3.2, 3.1, 3.4, 4.1, 4.2, 3.1, 3.2, 3.5, 3.6, 3.0, 3.4, 3.5, 2.3, 3.2, 3.5, 3.8, 3.0, 3.8, 3.2, 3.7, 3.3 1.4, 1.4, 1.3, 1.5, 1.4, 1.7, 1.4, 1.5, 1.4, 1.5, 1.5, 1.6, 1.4, 1.1, 1.2, 1.5, 1.3, 1.4, 1.7, 1.5, 1.7, 1.5, 1.0, 1.7, 1.9, 1.6, 1.6, 1.5, 1.4, 1.6, 1.6, 1.5, 1.5, 1.4, 1.5, 1.2, 1.3, 1.4, 1.3, 1.5, 1.3, 1.3, 1.3, 1.6, 1.9, 1.4, 1.6, 1.4, 1.5, 1.4 0.2, 0.2, 0.2, 0.2, 0.2, 0.4, 0.3, 0.2, 0.2, 0.1, 0.2, 0.2, 0.1, 0.1, 0.2, 0.4, 0.4, 0.3, 0.3, 0.3, 0.2, 0.4, 0.2, 0.5, 0.2, 0.2, 0.4, 0.2, 0.2, 0.2, 0.2, 0.4, 0.1, 0.2, 0.2, 0.2, 0.2, 0.1, 0.2, 0.2, 0.3, 0.3, 0.2, 0.6, 0.4, 0.3, 0.2, 0.2, 0.2, 0.2
versicolor 7.0, 6.4, 6.9, 5.5, 6.5, 5.7, 6.3, 4.9, 6.6, 5.2, 5.0, 5.9, 6.0, 6.1, 5.6, 6.7, 5.6, 5.8, 6.2, 5.6, 5.9, 6.1, 6.3, 6.1, 6.4, 6.6, 6.8, 6.7, 6.0, 5.7, 5.5, 5.5, 5.8, 6.0, 5.4, 6.0, 6.7, 6.3, 5.6, 5.5, 5.5, 6.1, 5.8, 5.0, 5.6, 5.7, 5.7, 6.2, 5.1, 5.7 3.2, 3.2, 3.1, 2.3, 2.8, 2.8, 3.3, 2.4, 2.9, 2.7, 2.0, 3.0, 2.2, 2.9, 2.9, 3.1, 3.0, 2.7, 2.2, 2.5, 3.2, 2.8, 2.5, 2.8, 2.9, 3.0, 2.8, 3.0, 2.9, 2.6, 2.4, 2.4, 2.7, 2.7, 3.0, 3.4, 3.1, 2.3, 3.0, 2.5, 2.6, 3.0, 2.6, 2.3, 2.7, 3.0, 2.9, 2.9, 2.5, 2.8 4.7, 4.5, 4.9, 4.0, 4.6, 4.5, 4.7, 3.3, 4.6, 3.9, 3.5, 4.2, 4.0, 4.7, 3.6, 4.4, 4.5, 4.1, 4.5, 3.9, 4.8, 4.0, 4.9, 4.7, 4.3, 4.4, 4.8, 5.0, 4.5, 3.5, 3.8, 3.7, 3.9, 5.1, 4.5, 4.5, 4.7, 4.4, 4.1, 4.0, 4.4, 4.6, 4.0, 3.3, 4.2, 4.2, 4.2, 4.3, 3.0, 4.1 1.4, 1.5, 1.5, 1.3, 1.5, 1.3, 1.6, 1.0, 1.3, 1.4, 1.0, 1.5, 1.0, 1.4, 1.3, 1.4, 1.5, 1.0, 1.5, 1.1, 1.8, 1.3, 1.5, 1.2, 1.3, 1.4, 1.4, 1.7, 1.5, 1.0, 1.1, 1.0, 1.2, 1.6, 1.5, 1.6, 1.5, 1.3, 1.3, 1.3, 1.2, 1.4, 1.2, 1.0, 1.3, 1.2, 1.3, 1.3, 1.1, 1.3
virginica 6.3, 5.8, 7.1, 6.3, 6.5, 7.6, 4.9, 7.3, 6.7, 7.2, 6.5, 6.4, 6.8, 5.7, 5.8, 6.4, 6.5, 7.7, 7.7, 6.0, 6.9, 5.6, 7.7, 6.3, 6.7, 7.2, 6.2, 6.1, 6.4, 7.2, 7.4, 7.9, 6.4, 6.3, 6.1, 7.7, 6.3, 6.4, 6.0, 6.9, 6.7, 6.9, 5.8, 6.8, 6.7, 6.7, 6.3, 6.5, 6.2, 5.9 3.3, 2.7, 3.0, 2.9, 3.0, 3.0, 2.5, 2.9, 2.5, 3.6, 3.2, 2.7, 3.0, 2.5, 2.8, 3.2, 3.0, 3.8, 2.6, 2.2, 3.2, 2.8, 2.8, 2.7, 3.3, 3.2, 2.8, 3.0, 2.8, 3.0, 2.8, 3.8, 2.8, 2.8, 2.6, 3.0, 3.4, 3.1, 3.0, 3.1, 3.1, 3.1, 2.7, 3.2, 3.3, 3.0, 2.5, 3.0, 3.4, 3.0 6.0, 5.1, 5.9, 5.6, 5.8, 6.6, 4.5, 6.3, 5.8, 6.1, 5.1, 5.3, 5.5, 5.0, 5.1, 5.3, 5.5, 6.7, 6.9, 5.0, 5.7, 4.9, 6.7, 4.9, 5.7, 6.0, 4.8, 4.9, 5.6, 5.8, 6.1, 6.4, 5.6, 5.1, 5.6, 6.1, 5.6, 5.5, 4.8, 5.4, 5.6, 5.1, 5.1, 5.9, 5.7, 5.2, 5.0, 5.2, 5.4, 5.1 2.5, 1.9, 2.1, 1.8, 2.2, 2.1, 1.7, 1.8, 1.8, 2.5, 2.0, 1.9, 2.1, 2.0, 2.4, 2.3, 1.8, 2.2, 2.3, 1.5, 2.3, 2.0, 2.0, 1.8, 2.1, 1.8, 1.8, 1.8, 2.1, 1.6, 1.9, 2.0, 2.2, 1.5, 1.4, 2.3, 2.4, 1.8, 1.8, 2.1, 2.4, 2.3, 1.9, 2.3, 2.5, 2.3, 1.9, 2.0, 2.3, 1.8

GT Extras - Plots

The gt_plt_sparkline() function creates a line chart in table cells. It needs one line of code for each column you want to plot; in our case, that means one line for each of the four measurement columns.

In a sparkline, the highest value and the lowest value are marked, and the rest is just a line plot of the values.

# Needs svglite package
agg_iris %>%
  gt() %>%
  gt_plt_sparkline(Sepal.L) %>%
  gt_plt_sparkline(Sepal.W) %>%
  gt_plt_sparkline(Petal.L) %>%
  gt_plt_sparkline(Petal.W)
Species Sepal.L Sepal.W Petal.L Petal.W
setosa 5.0 3.3 1.4 0.20
versicolor 5.7 2.8 4.1 1.3
virginica 5.9 3.0 5.1 1.8
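
The sparklines can be adjusted with a few arguments. The sketch below assumes the installed gtExtras version accepts type, same_limit and label in gt_plt_sparkline(); check ?gt_plt_sparkline if any of them is rejected.

```r
library(gt)
library(gtExtras)
library(dplyr)

spark_iris <- iris %>%
  group_by(Species) %>%
  summarize(
    Sepal.L = list(Sepal.Length),
    Petal.L = list(Petal.Length)
  )

spark_tab <- spark_iris %>%
  gt() %>%
  # Shade the area under the line (assumed option: type = "shaded")
  gt_plt_sparkline(Sepal.L, type = "shaded") %>%
  # Give each row its own y-axis range and drop the numeric label
  gt_plt_sparkline(Petal.L, same_limit = FALSE, label = FALSE)

spark_tab
```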

GT Extras - Plots

We can also look at distribution plots of these lists with the gt_plt_dist() function, in which we specify the variable name and the type of plot. All four options for distribution plots are shown here: density (KDE), boxplot, histogram, and rug_strip.

Other plotting functions include:

agg_iris %>%
  gt() %>% 
  gt_plt_dist(Sepal.L, type = "density") %>%
  gt_plt_dist(Sepal.W, type = "boxplot") %>%
  gt_plt_dist(Petal.L, type = "histogram") %>%
  gt_plt_dist(Petal.W, type = "rug_strip")
Species Sepal.L Sepal.W Petal.L Petal.W
setosa
versicolor
virginica

GT Extras - Plots

gt_plt_bar_pct() does not require aggregated data. The chart is a score bar that shows how close the value in the cell is to the maximum value in that column, counting only the values that are included in the table.

This means that the highest value in the table has its bar full.

Notice here that we’re only using the first six rows of data, so the percentages are of the maximum of the first six rows, not the whole thing.

head(iris) %>%
  gt() %>%
  gt_plt_bar_pct(Sepal.Length, labels = TRUE) %>%
  gt_plt_bar_pct(Sepal.Width, labels=FALSE, fill = "forestgreen")
Sepal.Length   Sepal.Width   Petal.Length   Petal.Width   Species
[bar] 94.4%    [bar]         1.4            0.2           setosa
[bar] 90.7%    [bar]         1.4            0.2           setosa
[bar] 87%      [bar]         1.3            0.2           setosa
[bar] 85.2%    [bar]         1.5            0.2           setosa
[bar] 92.6%    [bar]         1.4            0.2           setosa
[bar] 100%     [bar]         1.7            0.4           setosa
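
The labels in the output above can be reproduced by hand, which makes the "maximum of the rows in the table" point concrete. This is plain arithmetic in base R, not a call into gtExtras.

```r
# Sepal.Length for the six rows that go into the table
first_six <- head(iris)$Sepal.Length

# Percentage of the maximum among those six rows
pct_of_table_max <- round(first_six / max(first_six) * 100, 1)
pct_of_table_max
# 94.4 90.7 87.0 85.2 92.6 100.0

# For comparison: percentage of the maximum over all 150 rows
pct_of_full_max <- round(first_six / max(iris$Sepal.Length) * 100, 1)
pct_of_full_max
```

The first vector matches the bar labels; the second is smaller throughout because the full dataset contains longer sepals than any of the first six rows.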

GT Extras - Plots

gt_plt_summary() can summarize your entire dataset in a single line of code.

If you have correctly specified the data type in your dataframe, it will automatically aggregate columns and display the right chart for them!

Notice that all 150 rows of iris are going into this, and that we’re not doing anything to aggregate them beforehand.

iris %>%
  gt_plt_summary()
## Warning in geom_point(data = NULL, aes(x = rng_vals[1], y = 1), color = "transparent", : All aesthetics have length 1, but the data has 150 rows.
## ℹ Did you mean to use `annotate()`?
## Warning in geom_point(data = NULL, aes(x = rng_vals[2], y = 1), color = "transparent", : All aesthetics have length 1, but the data has 150 rows.
## ℹ Did you mean to use `annotate()`?
.
150 rows x 5 cols
Column         Plot Overview                                            Missing   Mean   Median   SD
Sepal.Length   [plot: 4.3 to 7.9]                                       0.0%      5.8    5.8      0.8
Sepal.Width    [plot: 2.0 to 4.4]                                       0.0%      3.1    3.0      0.4
Petal.Length   [plot: 1.0 to 6.9]                                       0.0%      3.8    4.3      1.8
Petal.Width    [plot: 0.1 to 2.5]                                       0.0%      1.2    1.3      0.8
Species        [plot: setosa, versicolor and virginica; 3 categories]   0.0%      —      —        —
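
Since the summary depends on the column types, it can help to check them before calling gt_plt_summary(). A minimal base R check, with col_types as an illustrative name:

```r
# First class of every column in iris
col_types <- vapply(iris, function(x) class(x)[1], character(1))
col_types
```

Here the four measurements are numeric and Species is a factor, which is consistent with the summary above showing numeric statistics for the first four rows and a category count for Species.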

GT Extras - Themes

(Code taken from the help files of gtExtras)

We can apply several premade themes using gt_theme_??? where ??? is any of several options.

This includes the ESPN website style:

themed_tab <- head(mtcars) %>%
  gt() %>%
  gt_theme_espn()

themed_tab
mpg cyl disp hp drat wt qsec vs am gear carb
21.0 6 160 110 3.90 2.620 16.46 0 1 4 4
21.0 6 160 110 3.90 2.875 17.02 0 1 4 4
22.8 4 108 93 3.85 2.320 18.61 1 1 4 1
21.4 6 258 110 3.08 3.215 19.44 1 0 3 1
18.7 8 360 175 3.15 3.440 17.02 0 0 3 2
18.1 6 225 105 2.76 3.460 20.22 1 0 3 1

GT Extras - Themes

A dot-matrix printer style with fixed width text.

themed_tab2 <- head(mtcars) %>%
  gt() %>%
  gt_theme_dot_matrix()

themed_tab2
mpg cyl disp hp drat wt qsec vs am gear carb
21.0 6 160 110 3.90 2.620 16.46 0 1 4 4
21.0 6 160 110 3.90 2.875 17.02 0 1 4 4
22.8 4 108 93 3.85 2.320 18.61 1 1 4 1
21.4 6 258 110 3.08 3.215 19.44 1 0 3 1
18.7 8 360 175 3.15 3.440 17.02 0 0 3 2
18.1 6 225 105 2.76 3.460 20.22 1 0 3 1

GT Extras - Themes

Or the New York Times style of publishing tables.

themed_tab3 <- head(mtcars) %>%
  gt() %>%
  gt_theme_nytimes()

themed_tab3
mpg cyl disp hp drat wt qsec vs am gear carb
21.0 6 160 110 3.90 2.620 16.46 0 1 4 4
21.0 6 160 110 3.90 2.875 17.02 0 1 4 4
22.8 4 108 93 3.85 2.320 18.61 1 1 4 1
21.4 6 258 110 3.08 3.215 19.44 1 0 3 1
18.7 8 360 175 3.15 3.440 17.02 0 0 3 2
18.1 6 225 105 2.76 3.460 20.22 1 0 3 1

Other Themes include:

GT Case Study - Palmer Penguins

Taken from: https://towardsdatascience.com/exploring-the-gt-grammar-of-tables-package-in-r-7fff9d0b40cd

Notice how gt() respects the group_by() function by adding a group label row for each island.

penguins = read.csv("Penguins_clean.csv")
pendata <- penguins %>%
          tibble() %>% # convert from data.frame to tidyverse tibble
          filter(species == "Adelie") %>%
          group_by(island, year) %>%
          summarise_if(is.numeric, sum)

pendata
## # A tibble: 9 × 10
## # Groups:   island [3]
##   island     year bill_length_mm_female bill_length_mm_male bill_depth_mm_female
##   <chr>     <int>                 <dbl>               <dbl>                <dbl>
## 1 Biscoe     2007                  187.                196.                 92.9
## 2 Biscoe     2008                  330.                367.                155  
## 3 Biscoe     2009                  305.                330.                142. 
## 4 Dream      2007                  341.                404.                161. 
## 5 Dream      2008                  290.                321.                142. 
## 6 Dream      2009                  366.                397.                173. 
## 7 Torgersen  2007                  306.                279.                145. 
## 8 Torgersen  2008                  293.                327.                139. 
## 9 Torgersen  2009                  302.                327.                137. 
## # ℹ 5 more variables: bill_depth_mm_male <dbl>, flipper_length_mm_female <int>,
## #   flipper_length_mm_male <int>, body_mass_g_female <dbl>,
## #   body_mass_g_male <dbl>
pentab = pendata %>% gt()

pentab
year bill_length_mm_female bill_length_mm_male bill_depth_mm_female bill_depth_mm_male flipper_length_mm_female flipper_length_mm_male body_mass_g_female body_mass_g_male
Biscoe
2007 187.4 195.8 92.9 91.5 909 908 173.50 188.50
2008 329.8 366.8 155.0 171.3 1679 1733 292.00 361.00
2009 304.7 330.4 141.6 156.0 1530 1548 275.75 341.50
Dream
2007 340.7 403.8 160.6 194.3 1665 1886 294.25 410.25
2008 290.2 320.9 142.3 151.1 1512 1560 273.00 328.00
2009 365.7 397.3 172.8 182.1 1895 1928 335.75 394.50
Torgersen
2007 306.2 279.3 145.3 143.4 1501 1346 278.00 289.75
2008 292.9 327.4 139.2 150.7 1520 1548 281.50 335.50
2009 302.2 326.8 136.7 151.9 1498 1589 255.50 302.75

GT Case Study - Palmer Penguins

In library(gt), the data_color function adds colours to data cells.

pentab2 <- pentab %>% data_color(palette = "PuOr")
pentab2
year bill_length_mm_female bill_length_mm_male bill_depth_mm_female bill_depth_mm_male flipper_length_mm_female flipper_length_mm_male body_mass_g_female body_mass_g_male
Biscoe
2007 187.4 195.8 92.9 91.5 909 908 173.50 188.50
2008 329.8 366.8 155.0 171.3 1679 1733 292.00 361.00
2009 304.7 330.4 141.6 156.0 1530 1548 275.75 341.50
Dream
2007 340.7 403.8 160.6 194.3 1665 1886 294.25 410.25
2008 290.2 320.9 142.3 151.1 1512 1560 273.00 328.00
2009 365.7 397.3 172.8 182.1 1895 1928 335.75 394.50
Torgersen
2007 306.2 279.3 145.3 143.4 1501 1346 278.00 289.75
2008 292.9 327.4 139.2 150.7 1520 1548 281.50 335.50
2009 302.2 326.8 136.7 151.9 1498 1589 255.50 302.75

GT Case Study - Palmer Penguins

We can specify columns (and rows), and use any of the named palettes available from RColorBrewer or viridis. (Or make our own, if we have a team colour scheme to match.)

See ?data_color for the rest of the available palettes.

pentab3 <- pentab %>%
  data_color(palette = "Spectral",
             columns = c(bill_length_mm_male, bill_depth_mm_male,
                         flipper_length_mm_male, body_mass_g_male,
                         bill_length_mm_female, bill_depth_mm_female,
                         flipper_length_mm_female, body_mass_g_female))

pentab3
year bill_length_mm_female bill_length_mm_male bill_depth_mm_female bill_depth_mm_male flipper_length_mm_female flipper_length_mm_male body_mass_g_female body_mass_g_male
Biscoe
2007 187.4 195.8 92.9 91.5 909 908 173.50 188.50
2008 329.8 366.8 155.0 171.3 1679 1733 292.00 361.00
2009 304.7 330.4 141.6 156.0 1530 1548 275.75 341.50
Dream
2007 340.7 403.8 160.6 194.3 1665 1886 294.25 410.25
2008 290.2 320.9 142.3 151.1 1512 1560 273.00 328.00
2009 365.7 397.3 172.8 182.1 1895 1928 335.75 394.50
Torgersen
2007 306.2 279.3 145.3 143.4 1501 1346 278.00 289.75
2008 292.9 327.4 139.2 150.7 1520 1548 281.50 335.50
2009 302.2 326.8 136.7 151.9 1498 1589 255.50 302.75
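
For a home-made palette, data_color() also accepts a vector of colours in palette. The sketch below uses mtcars so that it runs without the penguins file; the two colours are arbitrary stand-ins for a team scheme.

```r
library(gt)
library(dplyr) # for the %>% pipe

custom_tab <- head(mtcars) %>%
  gt() %>%
  data_color(
    columns = c(mpg, hp),
    # Placeholder colours: low values white, high values dark green
    palette = c("white", "darkgreen")
  )

custom_tab
```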

Grouping variables

(Heavily modified from https://themockup.blog/static/resources/gt-cookbook.html )

We can use a variable in the dataset as a grouping variable to put break lines into the output table when outputting a dataset as a table.

To demonstrate this, first, let’s take a few rows and columns from mtcars.

mtcars_mini = mtcars[1:10, c("cyl","gear","hp","wt","mpg")]
mtcars_mini %>% gt() 
cyl gear hp wt mpg
6 4 110 2.620 21.0
6 4 110 2.875 21.0
4 4 93 2.320 22.8
6 3 110 3.215 21.4
8 3 175 3.440 18.7
6 3 105 3.460 18.1
8 3 245 3.570 14.3
4 4 62 3.190 24.4
4 4 95 3.150 22.8
6 4 123 3.440 19.2

Now let’s use the groupname_col option in gt() to establish a column as the group name. Notice that cyl disappears from the columns and appears as a group label row instead. (Adding a group_by() line before gt() does the same thing.)

mtcars_mini %>% 
  gt(groupname_col = "cyl")
gear hp wt mpg
6
4 110 2.620 21.0
4 110 2.875 21.0
3 110 3.215 21.4
3 105 3.460 18.1
4 123 3.440 19.2
4
4 93 2.320 22.8
4 62 3.190 24.4
4 95 3.150 22.8
8
3 175 3.440 18.7
3 245 3.570 14.3

Grouping variables

We can also make the grouping variable a character variable and it works the same way. Notice that the dplyr function mutate() changes the values of the variable, not the name of the variable, so we can still refer to cyl in the gt() line.

mtcars_mini %>% 
   mutate(cyl = paste(cyl, "Cylinders"))
##                           cyl gear  hp    wt  mpg
## Mazda RX4         6 Cylinders    4 110 2.620 21.0
## Mazda RX4 Wag     6 Cylinders    4 110 2.875 21.0
## Datsun 710        4 Cylinders    4  93 2.320 22.8
## Hornet 4 Drive    6 Cylinders    3 110 3.215 21.4
## Hornet Sportabout 8 Cylinders    3 175 3.440 18.7
## Valiant           6 Cylinders    3 105 3.460 18.1
## Duster 360        8 Cylinders    3 245 3.570 14.3
## Merc 240D         4 Cylinders    4  62 3.190 24.4
## Merc 230          4 Cylinders    4  95 3.150 22.8
## Merc 280          6 Cylinders    4 123 3.440 19.2
mtcars_mini %>% 
  mutate(cyl = paste(cyl, "Cylinders")) %>% 
  gt(groupname_col = "cyl")
gear hp wt mpg
6 Cylinders
4 110 2.620 21.0
4 110 2.875 21.0
3 110 3.215 21.4
3 105 3.460 18.1
4 123 3.440 19.2
4 Cylinders
4 93 2.320 22.8
4 62 3.190 24.4
4 95 3.150 22.8
8 Cylinders
3 175 3.440 18.7
3 245 3.570 14.3

Grouping variables

We can use multiple grouping variables too, where later-listed variables are treated as subgroups.

(See https://themockup.blog/static/resources/gt-cookbook.html#custom-groups for information on custom groups like splitting a continuous variable, or by the first letter of a name.)

mtcars_mini %>% 
  mutate(cyl = paste(cyl, "Cylinders"), gear = paste(gear, "Gears")) %>% 
  gt(groupname_col = c("cyl","gear"))
hp wt mpg
6 Cylinders - 4 Gears
110 2.620 21.0
110 2.875 21.0
123 3.440 19.2
4 Cylinders - 4 Gears
93 2.320 22.8
62 3.190 24.4
95 3.150 22.8
6 Cylinders - 3 Gears
110 3.215 21.4
105 3.460 18.1
8 Cylinders - 3 Gears
175 3.440 18.7
245 3.570 14.3

Grouping variables

We can put the group labels on the side instead of as inserted rows by using rowname_col instead of groupname_col. This only seems to work with a single variable (but I may be wrong).

mtcars_mini %>% 
  mutate(cyl = paste(cyl, "Cylinders")) %>% 
  gt(rowname_col = "cyl")
gear hp wt mpg
6 Cylinders 4 110 2.620 21.0
6 Cylinders 4 110 2.875 21.0
4 Cylinders 4 93 2.320 22.8
6 Cylinders 3 110 3.215 21.4
8 Cylinders 3 175 3.440 18.7
6 Cylinders 3 105 3.460 18.1
8 Cylinders 3 245 3.570 14.3
4 Cylinders 4 62 3.190 24.4
4 Cylinders 4 95 3.150 22.8
6 Cylinders 4 123 3.440 19.2

Saving gt objects

You can save a gt object as HTML or LaTeX code, as a PNG or PDF image, or as RTF (rich text format) output. gtsave() will guess what format you want from the file extension.

mtcars_fancy <- mtcars %>% 
  mutate(cyl = paste(cyl, "Cylinders")) %>% 
  gt(rowname_col = "cyl")

gtsave(mtcars_fancy, "mtcars.pdf")
gtsave(mtcars_fancy, "mtcars.html")
gtsave(mtcars_fancy, "mtcars.png") # requires webshot2 package
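
If you want the code as a string rather than a file, gt also provides as_raw_html() and as_latex(). A minimal sketch, with small_tab as an illustrative name:

```r
library(gt)
library(dplyr) # for the %>% pipe

small_tab <- head(mtcars) %>% gt()

# HTML for the table as a character string (styles inlined)
html_code <- as_raw_html(small_tab)

# LaTeX code for the same table
latex_code <- as_latex(small_tab)

nchar(html_code) # just to show that a non-empty string came back
```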

Conditional formatting

gt() can do conditional formatting too.

stocks <- data.frame(
  Symbol = c("GOOG", "FB", "AMZN", "NFLX", "TSLA"),
  Price = c(1265.13, 187.89, 1761.33, 276.82, 328.13),
  Change = c(4.14, 1.51, -19.45, 5.32, -12.45)
)
stocks %>% gt()
Symbol Price Change
GOOG 1265.13 4.14
FB 187.89 1.51
AMZN 1761.33 -19.45
NFLX 276.82 5.32
TSLA 328.13 -12.45

Conditional formatting

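This is done with tab_style(), which needs two things: style, the formatting to apply, and locations, the cells to apply it to. The rows argument of cells_body() can take a condition on the data.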

stocks %>% 
  gt() %>% 
  tab_style(
    style = cell_text(color = "red", weight = "bold"),
    locations = cells_body(
      columns = c(Change, Price),
      rows = Change < 0)) %>% 
  tab_style(
    style = cell_text(color = "blue", style = "italic"),
    locations = cells_body(
      columns = c(Change, Price),
      rows = Change >= 0))
Symbol Price Change
GOOG 1265.13 4.14
FB 187.89 1.51
AMZN 1761.33 -19.45
NFLX 276.82 5.32
TSLA 328.13 -12.45
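
Styles can also be combined by passing a list, for example a background fill plus bold text. This is a sketch along the same lines as the code above; the fill colour is an arbitrary choice.

```r
library(gt)
library(dplyr) # for the %>% pipe

stocks <- data.frame(
  Symbol = c("GOOG", "FB", "AMZN", "NFLX", "TSLA"),
  Price = c(1265.13, 187.89, 1761.33, 276.82, 328.13),
  Change = c(4.14, 1.51, -19.45, 5.32, -12.45)
)

fill_tab <- stocks %>%
  gt() %>%
  tab_style(
    # Several style helpers can be applied at once via list()
    style = list(
      cell_fill(color = "mistyrose"),
      cell_text(weight = "bold")
    ),
    # Only the Change cells in rows where the change is negative
    locations = cells_body(columns = Change, rows = Change < 0)
  )

fill_tab
```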

Table customization

With the opt_...() functions we can apply table customization options. Here, opt_table_lines("all") turns on every table line.

mtcars_mini %>% 
  gt() %>% 
  opt_table_lines("all")
cyl gear hp wt mpg
6 4 110 2.620 21.0
6 4 110 2.875 21.0
4 4 93 2.320 22.8
6 3 110 3.215 21.4
8 3 175 3.440 18.7
6 3 105 3.460 18.1
8 3 245 3.570 14.3
4 4 62 3.190 24.4
4 4 95 3.150 22.8
6 4 123 3.440 19.2

Table customization

With the opt_...() functions we can apply table customization options. Here, opt_table_outline() adds an outline around the table.

mtcars_mini %>% 
  gt() %>% 
   opt_table_outline()
cyl gear hp wt mpg
6 4 110 2.620 21.0
6 4 110 2.875 21.0
4 4 93 2.320 22.8
6 3 110 3.215 21.4
8 3 175 3.440 18.7
6 3 105 3.460 18.1
8 3 245 3.570 14.3
4 4 62 3.190 24.4
4 4 95 3.150 22.8
6 4 123 3.440 19.2

Table customization

We can even use google_font to pull down fonts from Google and use those. See https://fonts.google.com/ for demos.

mtcars_mini %>% 
  gt() %>% 
  opt_table_font(
    font = list(google_font(name = "Merriweather"), "Cochin", "Serif"))
cyl gear hp wt mpg
6 4 110 2.620 21.0
6 4 110 2.875 21.0
4 4 93 2.320 22.8
6 3 110 3.215 21.4
8 3 175 3.440 18.7
6 3 105 3.460 18.1
8 3 245 3.570 14.3
4 4 62 3.190 24.4
4 4 95 3.150 22.8
6 4 123 3.440 19.2
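
Two more opt_ functions, sketched in the same style: opt_row_striping() for alternating row shading and opt_align_table_header() for the title position. The title text is a placeholder.

```r
library(gt)
library(dplyr) # for the %>% pipe

mtcars_mini <- mtcars[1:10, c("cyl", "gear", "hp", "wt", "mpg")]

striped_tab <- mtcars_mini %>%
  gt() %>%
  tab_header(title = "Placeholder title") %>%
  opt_row_striping() %>%                  # shade alternating rows
  opt_align_table_header(align = "left")  # left-align the title

striped_tab
```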

Table customization

Other table option functions include:

External images

We can bring in external images like team logos with gt_img_rows(). First we need the URLs of the images.

teams <- "https://github.com/nflverse/nflfastR-data/raw/master/teams_colors_logos.rds"
team_df <- readRDS(url(teams)) 

team_df <- tail(team_df) %>% 
  dplyr::select(team_wordmark, team_abbr, logo = team_logo_espn, team_name:team_conf)
team_df
## # A tibble: 6 × 7
##   team_wordmark            team_abbr logo  team_name team_id team_nick team_conf
##   <chr>                    <chr>     <chr> <chr>     <chr>   <chr>     <chr>    
## 1 https://github.com/nflv… SEA       http… Seattle … 4600    Seahawks  NFC      
## 2 https://github.com/nflv… SF        http… San Fran… 4500    49ers     NFC      
## 3 https://github.com/nflv… STL       http… St. Loui… 2510    Rams      NFC      
## 4 https://github.com/nflv… TB        http… Tampa Ba… 4900    Buccanee… NFC      
## 5 https://github.com/nflv… TEN       http… Tennesse… 2100    Titans    AFC      
## 6 https://github.com/nflv… WAS       http… Washingt… 5110    Commande… NFC

External images

We can bring in external images like team logos with gt_img_rows().

We do this by specifying the columns to place the image, height (or width) to make sure it fits nicely in the table, and possibly img_source for technical purposes.

logo_table <- team_df %>%
  gt() %>%
  gt_img_rows(columns = team_wordmark, height = 25) %>%
  gt_img_rows(columns = logo, img_source = "web", height = 30)

logo_table
team_wordmark team_abbr team_name team_id team_nick team_conf
SEA Seattle Seahawks 4600 Seahawks NFC
SF San Francisco 49ers 4500 49ers NFC
STL St. Louis Rams 2510 Rams NFC
TB Tampa Bay Buccaneers 4900 Buccaneers NFC
TEN Tennessee Titans 2100 Titans AFC
WAS Washington Commanders 5110 Commanders NFC

See also: